In this section, we employ visually driven data analysis techniques to answer the questions in the challenge. We also explore the packages required to build the plots. The criteria for selecting plots are as follows:
- Level of customization
- Ease of use and implementation of customization
- Ease of understanding and interpretation of the plot, in terms of both clarity and aesthetics
- Interactivity
Based on the interactive charts below, the top 3 most popular locations by transaction volume are as follows:
1 | Katerina’s Cafe | 01/11/2014
2 | Hippokampos | 01//2014
3 | Guy’s Gyros | 01//2014
1 | Katerina’s Cafe | 01/11/2014
2 | Hippokampos | 01//2014
3 | Guy’s Gyros | 01//2014
When comparing both datasets, I also noted differences between the transaction counts on the loyalty cards and the credit cards. In particular, there were days on which loyalty card transactions outnumbered credit card transactions. This is unexpected, as a loyalty card is used to collect discounts and rewards and cannot be used for payment. One would therefore expect the two volumes to be the same, or credit card volumes (actual purchases) to be higher than loyalty card volumes (in cases where the employee forgot to present the loyalty card for rewards or discounts). The daily difference in volume between the two cards is illustrated below.
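The daily comparison described above can be sketched in a few lines. This is a minimal Python illustration and not the code used to produce the charts in this article. The records are toy values made up for the example, and the name `daily_gap` is hypothetical.

```python
from collections import Counter
from datetime import date

# Toy records (made up for illustration, not the challenge data):
# one entry per transaction, holding only the transaction date.
cc_dates = [date(2014, 1, 6), date(2014, 1, 6), date(2014, 1, 7)]
loyalty_dates = [date(2014, 1, 6), date(2014, 1, 7), date(2014, 1, 7)]


def daily_gap(cc, loyalty):
    """Return {day: loyalty_count - cc_count} for every day in either list."""
    cc_counts = Counter(cc)
    loyalty_counts = Counter(loyalty)
    days = sorted(set(cc_counts) | set(loyalty_counts))
    return {day: loyalty_counts[day] - cc_counts[day] for day in days}


gaps = daily_gap(cc_dates, loyalty_dates)

# A positive gap marks a day with more loyalty card than credit card
# transactions, which is the unexpected case.
unexpected_days = [day for day, gap in gaps.items() if gap > 0]
print(unexpected_days)
```

With the toy data, only 7 January has a positive gap, so it is the only day flagged.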
We will then analyze the transaction volume by day of week to observe volume trends across the week.
1 | Katerina’s Cafe | Tue, Thu, Sat
2 | Hippokampos | Mon, Wed, Thu
3 | Guy’s Gyros | Mon, Thu, Fri
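A ranking of this kind comes from counting transactions per location and day of week. The sketch below is a hedged Python illustration with made-up rows; `top_days` is a hypothetical helper. Day names are taken from a fixed list so that the result does not depend on the system locale.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Toy rows (illustrative only): (location, transaction date).
transactions = [
    ("Katerina's Cafe", date(2014, 1, 7)),   # Tuesday
    ("Katerina's Cafe", date(2014, 1, 14)),  # Tuesday
    ("Katerina's Cafe", date(2014, 1, 9)),   # Thursday
    ("Hippokampos", date(2014, 1, 6)),       # Monday
]


def top_days(rows, location, n=3):
    """Return up to n day-of-week names with the most transactions."""
    counts = Counter(
        DAY_NAMES[day.weekday()] for loc, day in rows if loc == location
    )
    return [name for name, _ in counts.most_common(n)]


print(top_days(transactions, "Katerina's Cafe"))  # ['Tue', 'Thu']
```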
Since the credit card data includes timestamps, we go one step further and analyze the volume of credit card transactions by location and time.
1 | Katerina’s Cafe | Mon, Tue, Sat | 1700-2000 across weekday and weekends
2 | Hippokampos | Mon, Wed, Thu | Most popular on 1300-1600 on weekdays and 1700-2000 on weekends
3 | Guy’s Gyros | Mon, Thu, Fri | 1700-2000 across weekday and weekends except for Friday where 1300-1600 is most popular
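The time bands in the table can be derived by binning each transaction hour. The sketch below assumes four-hour blocks that start at 01:00, which is consistent with the two labels seen above (1300-1600 and 1700-2000). The remaining block boundaries are an assumption, as are the helper names `time_group` and `busiest_block` and the toy rows.

```python
from collections import Counter
from datetime import datetime


def time_group(hour):
    """Map an hour (0-23) to an assumed four-hour block such as '1300-1600'."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    start = ((hour - 1) % 24) // 4 * 4 + 1  # 1, 5, 9, 13, 17 or 21
    end = (start + 3) % 24                  # the last block wraps to 00
    return f"{start:02d}00-{end:02d}00"


# Toy rows (illustrative only): (location, transaction timestamp).
purchases = [
    ("Hippokampos", datetime(2014, 1, 6, 13, 30)),  # Monday
    ("Hippokampos", datetime(2014, 1, 8, 14, 5)),   # Wednesday
    ("Hippokampos", datetime(2014, 1, 11, 19, 0)),  # Saturday
]


def busiest_block(rows, location, weekend):
    """Return the most common time block on weekends or weekdays, or None."""
    counts = Counter(
        time_group(ts.hour)
        for loc, ts in rows
        if loc == location and (ts.weekday() >= 5) == weekend
    )
    return counts.most_common(1)[0][0] if counts else None


print(busiest_block(purchases, "Hippokampos", weekend=False))  # 1300-1600
print(busiest_block(purchases, "Hippokampos", weekend=True))   # 1700-2000
```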
As observed above, the popular days of the week differ between the two cards for some locations, such as Katerina’s Cafe. This is unexpected, as we would expect the trends to be similar for both cards.
Also, based on the credit card timestamps, we noted that all of the transactions at “Bean There Done That”, “Brewed Awakenings”, “Coffee Shack” and “Jack’s Magical Beans” are timestamped 12:00pm. It is highly unlikely that every transaction at these locations took place at the same time. Hence, the timestamps for these transactions may be incorrect.
Given that these timestamps may not represent the actual transaction times, we will not use them in the subsequent analysis.
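A check of this kind can be automated by looking for locations whose transactions all share one time of day. The following Python sketch uses made-up records, and `single_time_locations` is a hypothetical name. It requires at least two transactions per location so that a single purchase is not flagged.

```python
from collections import defaultdict
from datetime import datetime

# Toy rows (illustrative only): (location, transaction timestamp).
records = [
    ("Coffee Shack", datetime(2014, 1, 6, 12, 0)),
    ("Coffee Shack", datetime(2014, 1, 7, 12, 0)),
    ("Hippokampos", datetime(2014, 1, 6, 13, 12)),
    ("Hippokampos", datetime(2014, 1, 7, 19, 40)),
]


def single_time_locations(rows, min_count=2):
    """Return locations where every transaction has the same time of day."""
    times = defaultdict(set)
    counts = defaultdict(int)
    for location, ts in rows:
        times[location].add(ts.time())
        counts[location] += 1
    return sorted(
        loc
        for loc in times
        if len(times[loc]) == 1 and counts[loc] >= min_count
    )


print(single_time_locations(records))  # ['Coffee Shack']
```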
We also noted several transactions at Kronos Mart at 3am on 13 January and 19 January. This is highly unusual and warrants further investigation.
Other anomalies noted from the data are as follows:
Assuming that the car assignment list includes all employees, there are 44 distinct employees. However, there are 55 distinct credit card numbers and 54 distinct loyalty card numbers. This is unusual: each employee should have been issued a loyalty card, so we would expect the number of distinct credit cards, the number of distinct loyalty cards and the employee count to match.
This discrepancy should be investigated further. One explanation could be that some employees used more than one credit card with their loyalty card. Another could be that a new employee has not yet received a loyalty card. Given that the employee count differs from the number of distinct loyalty cards, we should check with Gastech whether any employees are missing from the list.
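The comparison above reduces to counting distinct values in each identifier column. The sketch below is a minimal Python illustration. The values are placeholders and not the challenge data, and `distinct_counts` is a hypothetical helper.

```python
# Placeholder values (illustrative only), one entry per row of each dataset.
employees = ["Employee A", "Employee B"]
cc_numbers = [1111, 1111, 2222, 3333]
loyalty_numbers = ["L1", "L2", "L2", "L3"]


def distinct_counts(**columns):
    """Return the number of distinct values in each named column."""
    return {name: len(set(values)) for name, values in columns.items()}


counts = distinct_counts(
    employees=employees,
    credit_cards=cc_numbers,
    loyalty_cards=loyalty_numbers,
)

# Difference from the employee count for every column that does not match it.
mismatched = {
    name: n - counts["employees"]
    for name, n in counts.items()
    if n != counts["employees"]
}
print(counts)
print(mismatched)
```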
From the car assignment dataset provided, we observe that there are nine truck drivers with no CarID. This is consistent with what Gastech has explained: employees who do not have company cars can check out company trucks for business use, but these trucks cannot be used for personal business.
The case scenario does not state which CarIDs refer to trucks. However, assuming that the three-digit CarIDs represent trucks, we find GPS data for only five trucks. There is no evidence as to whether the truck IDs are sequential or whether each truck driver is assigned to a truck. Given that there are nine truck drivers and GPS data for only five trucks, it is possible that:
1) Truck drivers are not each assigned a unique truck, and trucks can be shared.
2) Four GPS paths are missing from the GPS dataset.
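The count comparison above can be expressed as a short check. The Python sketch below follows the same working assumption that three-digit CarIDs are trucks. The inputs are toy values, and `truck_summary` is a hypothetical helper.

```python
# Toy inputs (illustrative only).
# Car assignments: (employee, CarID), with None for employees without a car.
assignments = [
    ("Driver A", 1),
    ("Driver B", None),
    ("Driver C", None),
    ("Driver D", None),
]
# CarID of each GPS record.
gps_ids = [1, 2, 101, 104, 104]


def truck_summary(assignments, gps_ids):
    """Count drivers without a CarID and list three-digit IDs seen in GPS data."""
    drivers_without_car = sum(1 for _, car_id in assignments if car_id is None)
    # Working assumption: a three-digit CarID denotes a truck.
    truck_ids = sorted({i for i in gps_ids if 100 <= i <= 999})
    return drivers_without_car, truck_ids


drivers, trucks = truck_summary(assignments, gps_ids)

# A positive shortfall means fewer trucks in the GPS data than drivers without
# cars, which is compatible with shared trucks or with missing GPS paths.
shortfall = drivers - len(trucks)
print(drivers, trucks, shortfall)
```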
To investigate this further, we will plot the GPS path of each CarID over the Abila map to identify its route.
Working with geospatial data
Download and launch QGIS, an open-source GIS software package.
Start a new project by clicking Project > New.
class : RasterLayer
band : 1 (of 3 bands)
dimensions : 1595, 2706, 4316070 (nrow, ncol, ncell)
resolution : 3.16216e-05, 3.16216e-05 (x, y)
extent : 24.82419, 24.90976, 36.04499, 36.09543 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs
source : MC2-tourist.tif
names : MC2.tourist
values : 0, 255 (min, max)

Reading layer `Abila' from data source
`D:\stellaloh91\Assignment\data\Geospatial' using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS: WGS 84
# A tibble: 685,169 x 10
Timestamp id lat long day date minute
<dttm> <fct> <dbl> <dbl> <int> <date> <int>
1 2014-01-06 06:28:01 35 36.1 24.9 6 2014-01-06 28
2 2014-01-06 06:28:01 35 36.1 24.9 6 2014-01-06 28
3 2014-01-06 06:28:03 35 36.1 24.9 6 2014-01-06 28
4 2014-01-06 06:28:05 35 36.1 24.9 6 2014-01-06 28
5 2014-01-06 06:28:06 35 36.1 24.9 6 2014-01-06 28
6 2014-01-06 06:28:07 35 36.1 24.9 6 2014-01-06 28
7 2014-01-06 06:28:09 35 36.1 24.9 6 2014-01-06 28
8 2014-01-06 06:28:10 35 36.1 24.9 6 2014-01-06 28
9 2014-01-06 06:28:11 35 36.1 24.9 6 2014-01-06 28
10 2014-01-06 06:28:12 35 36.1 24.9 6 2014-01-06 28
# ... with 685,159 more rows, and 3 more variables:
# day_of_week <weekday>, hour <int>, timegroup <fct>
By plotting the GPS coordinates with the Abila tourist map as the background, we can visualize the path each vehicle takes. The map is also interactive: clicking on any point in a trajectory shows the CarID, day and timestamp of that route. This allows us to match the timestamp and location back to the credit card dataset, and hence to match credit card numbers to their corresponding CarIDs.
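The matching described above is done visually on the map, but the same idea can be sketched programmatically: for a given transaction, list the CarIDs with a GPS point close to the shop around the transaction time. The Python sketch below is an illustration under stated assumptions. The shop coordinates, the 100 m radius, the 30 minute window, the toy GPS rows and the names `haversine_m` and `cars_near` are all hypothetical.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS 84 points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(a))


def cars_near(gps_rows, lat, lon, when,
              radius_m=100.0, window=timedelta(minutes=30)):
    """Return CarIDs with a GPS point within radius_m and window of a purchase."""
    return sorted({
        car_id
        for ts, car_id, p_lat, p_lon in gps_rows
        if abs(ts - when) <= window
        and haversine_m(p_lat, p_lon, lat, lon) <= radius_m
    })


# Toy GPS rows (illustrative only): (timestamp, CarID, lat, long).
gps_rows = [
    (datetime(2014, 1, 6, 13, 5), 35, 36.0601, 24.8601),  # close, in window
    (datetime(2014, 1, 6, 13, 10), 7, 36.0800, 24.9000),  # in window, far away
    (datetime(2014, 1, 6, 18, 0), 12, 36.0600, 24.8600),  # close, wrong time
]

# Hypothetical shop location and purchase time.
print(cars_near(gps_rows, 36.0600, 24.8600, datetime(2014, 1, 6, 13, 20)))
```

An empty result for a transaction means that no vehicle was recorded near the location at that time, which is the situation worth flagging for closer inspection.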
We have used a facet map below to visualize the daily route of CarID 1 across each of the 14 days of GPS data on record. This allows for easy observation and matching to the credit card data.
After mapping the GPS trajectories, we also noted that no GPS data indicated any car stopping near “Bean There Done That”, “Brewed Awakenings”, “Coffee Shack” and “Jack’s Magical Beans” around 12:00pm, the recorded time of the transactions. Hence, these transactions are either incorrectly timed or may even be fraudulent.
Furthermore, as mentioned above, we noted several transactions at Kronos Mart at 3am on 13 January and 19 January. By filtering the map for 3am, we found that there were no cars near Kronos Mart. Hence, these transactions are either incorrectly timed or may even be fraudulent.